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Abstract 

In this paper, based on the overlapping domain decomposition 
method (DDM) proposed in [10j, an one step preconditioner is pro¬ 
posed to solve 2D high frequency Helmholtz equation. The computa¬ 
tion domain is decomposed in both x and y directions, and the local 
solution on each subdomain is updated simultaneously in one iteration, 
thus there is no sweeping along certain directions. In these ways, the 
overlapping DDM is similar to the popular DDM for Poisson problem. 

The one step preconditioner simply take the restricted source on each 
subdomain, solve the local problems and summarize the local solutions 
on all subdomains including their PML area. The complexity of solv¬ 
ing the problem with the preconditioner is 0(Nn\t er ), where n^er is the 
number of iteration, and it is shown numerically that n\ t er is propor¬ 
tional to the number of subdomains in one direction. 2D Helmholtz 
problem with nearly a billion unknowns are solved efficiently with the 
preconditioner on massively parallel machines. 
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1 Introduction 


We consider in this paper to solve the Helmholtz equation in the full 
space M 2 , with Sommerfeld radiation condition, 


A u + k 2 u = / 


ri/ 2 (^ — i ku) 
or 


0 


m 


as r = x 


oo 


( 1 ) 


where k is the wave number. 

The domain decomposition method for the Helmholtz equation 
has been studied for years, many different DDMs have been proposed 
based on different boundary conditions at the subdomain interface. 

The DDM for the Helmholtz equation is very natural. Truncated 
with perfect match layer, the local problem on one subdomain could 
be approximately solved, and the local solution is passed to neighbour 
subdomains via interface to carry on the wave propagation process. 
Engquist and Ying mm proposed the sweeping preconditioner by 
approximating the inverse of Schur complements in the LDlY factor¬ 
ization, and the method has been further developed by Liu and Ying 
nil IE] using wave addition and dimension recursion. Wave traveling 
in varying medium generates reflections and refractions, and it’s more 
reasonable that the DDM admits reflection at the interface. While 
the sweeping preconditioner mm use a Dirichlet type interface condi¬ 
tion that does not admit reflection, a few new DDM is proposed with 
a reflective interface condition, such as the source transfer domain 
decomposition method by Chen and Xiang Oil], Stolk’s DDM [ 15] , 
double sweep preconditioner by Vion an Geuzaine [H], and polarized 
trace method by Zepeda m • The source transfer DDM mm ad¬ 
mits reflection only in one direction, recently Du and Wu [6] modified 
the method so that it admits reflection on both directions. Interest¬ 
ingly, we found that the source transfer DDM relates closely to Stolk’s 
DDM [T5J and polarized trace method [IIS], in the way that choosing 
the smoothing function in source transfer DDM to be Heaviside func¬ 
tion would lead to an interface condition that is similar to the ones of 
Stolk’s DDM and polarized trace method. 

The aforementioned domain decomposition methods in the liter¬ 
ature usually partition the domain into slices in one direction and 
sweep from one side to the other, then sweep backwards. Two direc¬ 
tions sweeping happens in a recursive way as in Liu and Ying [12], Du 
and Wu [16]. The serial sweeping order causes difficult in scalability in 
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parallel computing, and it’s impractical to cut too many slices in one 
direction. 

An overlapping DDM is proposed in m- The popular DDM for 
problems other than frequency wave domain problem works in such 
a way that, the domain is decomposed in multiple directions, and in 
one step of iteration, each subdomain takes the information from its 
neighbour subdomain, update its own solution and prepare the infor¬ 
mation to be use by its neighbour subdomains in the next step. The 
overlapping DDM works in the similar way, and there is no sweeping 
along certain direction at all, thus it’s suitable for large scale parallel 
computing. Since the overlapping DDM uses the source transfer type 
technique, the reflections is admitted near the interface. In [10] . for 
three layered medium, the reason is explained why the total wave so¬ 
lution is the summation of all incident, reflected and refracted waves 
on all subdomains, and the convergence of the method is estimated. 
In this paper, we reorganize the overlapping DDM method in a more 
concise way, and an one step preconditioner is proposed. Numerical 
examples are presented to show the preconditioner is simple, effective 
and suitable for parallel computing of high frequency wave problems. 

The numerical result of the preconditioner shows that the time cost 
is smaller without extra overlapping region. However, we still call the 
preconditioner overlapping , since the PML layer of one subdomain 
overlap with its neighbours, and the solution in the PML layer is 
added to the total solution, which is a major difference between this 
method and the popular DDM for Poisson problem. 

The rest of the paper is organized as follows. In section 2, the 
overlapping DDM for Helmholtz equation is reorganized in a concise 
way, and the one step iteration preconditioner is proposed. In section 
3, numerical examples for constant medium, simple layered medium 
and Marmousi model is presented, and the performance of the pre¬ 
conditioner is discussed. 


2 Overlapping domain decomposition 
preconditioner 

The frequence domain wave equations defined on unbounded domain 
could be solved on truncated domain with the perfect matched layer as 
the absorbing boundary condition mm- To solve Helmholtz problem 
(jT]) , the unbounded domain M 2 is truncated to a rectangle domain 
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- 1 [ kr Ipmhlx "I - ^pml] x [ ly ^pmli ly “1“ ^pml]; where /p 1M | is the 

length of PML layer. The uniaxial PML method j5j is used in this 
paper, where the complex coordinate is stretched in x and y direction 

f x :i 

separately, Xj(xj ) = / crj(t)dt, j = 1,2, and the PML medium 

Jo 

property is chosen that aj(t ) = 0 for \t\ < lj, and &j(t ) > 0 in PML 
layer |i| > lj. Then the PML equation on the truncated domain is 

J _1 V • (AVu) + k 2 u = f, in Q, (2) 

where A(x) = diag ( a2 ^ X2 ^ ; , and J(x) = ai(x\)ot 2 {x 2 )• The 

Vai(xi) tt2{X2)J 

operator of the truncated problem ([2]) is denoted C, 

C(u) = J _1 V • (AVu) + k 2 u. (3) 

The total computation domain is D has an interior region [—l x , l x ] x 
[—ly, l y ], which is parted into Nb x x Nb y non-overlapping subdomains. 
Denote A l = 2 l x /Nb x , Xi = —l x + iAl, i = 0, ...,Nb x , and yj = 
—ly + jAl, j = 0,..., Nby, then the non-overlapping subdomains are 
Qij := [xi,x i+ 1 ] x [yj,y j+ 1 ], i = 0,.. .,Nb Xl j = 0,.. ,,Nb y . 

Then each non-overlapping subdomain fijj is extended to overlap¬ 
ping subdomain Qij := [x^o, x^i] X \yj,o,yj,i], where 

if i = 0 
if * > 0 

if i < Nb x — 1 
if i = Nb x — 1 

if j = 0 
if j > 0 

if j < Nb y — 1 

if j = Nby - 1 

and lo is the length of overlapping region. The domain decomposition 
with 5x5 subdomains is demonstrated in Fig [l] 

On each subdomain i\.j, an local problem with PML layer is set 
up that solves wave held Uij with given source fi.j, 

J-jN ■ (Aij'Vuij) + k 2 Uij = fi :j , in D, (4) 

where J~j and Ajj is determined by the PML layer of i\,j. Denote 
the index set of neighbour Mij = {(*', j') | i' E — E {j— 


Xi,0 — 
Xi, 1 = 

yj, o = 

Vj, i = 


^pml? 

Xi lo ^pml? 

Xi-\-\ lo + /pml? 
^z+l “1“ ^pml? 

Vj ^pml? 

Uj lo ^pml? 

Uj -\-1 l O ~\~ /pml? 

Uj +1 “1“ ^pmb 
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(a) (b) 


Figure 1: Domain decomposition of 5x5 subdomains. The thick lines sep¬ 
arates the non-overlapping subdomains 0, ; j, i,j = 1... 5. The shadowed 
areas in (a) are the overlapping subdomain 0 14 , and 0 4j2 , which are 
also shown in (b). 


1 ,3,3 + 1}, and / ( i,j )}, so the subdomain j has neighbour 

subdomains G A/ij. Now the overlapping DDM is stated 

as follows. 

On the first step, solve the subdomain problem on j with source 
restricted to interior region Ojj, and the solution is denoted 


• (A,Wb) + = / b (j > 


On the successive steps, denote the subdomain solution of step s 
as ufj. In each step, solve the subdomain problem on fly with the 

residual of the neighbour subdomains restricted to interior region flj j 
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as source, 


V • (AjVul 


s+1 

3 


) + k 


2 u s+1 


E 

(i',j OeMj 




a- 


Such iteration goes on until the residual is small enough. 
DDM solution is 

“DDM = E E U tj 

S >=0 i,j 


in Qij 
(6) 

And the 


(7) 


The main idea of the overlapping DDM is as follows. On each sub- 
domain f \ : j, the source, e.g. f t .j on cause a local wave field u t .j, 
and the residual := fij — Cuij satisfied that = 0 in klij, and 
fij 7 ^ 0 in PML layer of Djj, thus the residual in PML layer contains 
the wave field information that can be used as incident wave field for 
neighbour subdomains to carry on the wave propagation precess. All 
subdomains solve the local problem in parallel, and send the wave 
information to neighbour subdomains in one iteration. After m iter¬ 
ations, the wave have approximately propagated over m subdomains. 
If there are medium discontinuities in the subdomains, reflections will 
be passed back by the residual in the PML layer in the next iteration. 
Such wave propagation precess goes on during the iteration, and the 
summation of all incident, reflected and refracted waves on all subdo¬ 
mains is the total solution. Detailed discussion for two subdomains 
with three layered medium could be found in m- 

To explain why the overlapping DDM works well for domain de¬ 
composition in both x and y directions, we elaborate on the wave 
information passing from to its neighbor fVy, 

By | 6 ]), the wave information is passed with the residual in the sub¬ 
domain’s PML layer, as shown in Fig [2j-(a). Alternately, the wave 
field in the subdomain’s PML layer could be recovered using only the 
values on incident boundaries, as shown in Fig[2j-(b) and (c), and the 
residual in the PML layer is then recovered. In either case, the wave 
information is passed not only in x direction and y direction, but also 
in the corner direction. The amount of the information passed to cor¬ 
ner neighbour subdomais increases as the length of PML layer or the 
length of extra overlap region increases. 

The overlapping DDM is more effective when using as a precon¬ 
ditioner than a solver. The one step preconditioner is chosen, which 
simply solves the subdomain problem on j with source restricted to 
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Figure 2 : Illustration of the information passing from to its neighbors. 
The subdomain 2 is separated into nine parts that belong to different non¬ 
overlapping subdomains, namely ^42 and > where G .A/4 2- (a) 

The information is contained in the residual in f^’s PML region, (b) The 
information is contained in the incident boundaries, marked with thick lines, 
(c) The information is contained in the incident boundaries, also marked with 
thick lines. 


interior region • 


in fij. 


( 8 ) 


and summarizes the solutions on all subdomains to get the approxi¬ 
mate solution used in the preconditioning procedure , 


«PC = u h 


(9) 


Remark: A smoothing function could be multiplied to the solu¬ 
tion in the PML region to keep the solution in IL 1 (ff), such technique 
is used in [3]. In ([6]), .,) could be substituted with •,), 

where fyj is a smoothing function for subdomain , such that 
/ 3ij G C 2 (M 2 ), , ; = 1, and A'J'|r 2 \q., = 0. Mean while 

in ([7]) and © , the summation over uf ■ could be substituted with 
@i,j u i j- I n °ur numerical experiments, since the remaining value near 
the PML layer outside boundary is negligible, we simply omit the 
smoothing function. 


7 





























3 Numerical experiments 


Three numerical experiments that includes constant medium, simple 
layered medium and Marmousi model are carried out to test the per¬ 
formance of the overlapping DDM. 

Finite difference method with second order accuracy is used to 
discretize the Helmholtz equation. The PML layer is of 30 grid points 
width by default. Single shot in the subdomain flo,o is taken as the 

source, and the position is ( x s ,y s ) where x s = xq + - A l, and y s = 


yo + - Al. The shape of the shot is an approximate delta function, 

O ^ 

fij = - — —5(ih x — x s ,jh y — y s ), where h x , h y are the grid size in x 
and y direction, respectfully. The relative tolerance of linear solver is 

IQ-iO £ or a j£ casegi 

The number of equivalent sweeping, defined as - 1 , 

max{ Nb x ,Nb y \ 

is used to measure the effectiveness of the preconditioner. A source 
at one side of the domain generates the wave that pass to the other 
side, and it takes at least number of subdomains in the direction to 
accomplish the traveling. Thus, we are satisfied if the number of 
equivalent sweeping does not change if number of subdomains grows. 

The Tianhe-2 cluster is used in our numerical experiments, each 
node of the cluster includes two 2.2GHz Xeon E5-2692 processors with 
12 cores. The number of processors in use are Nb x x Nb y . 


3.1 Constant medium 

The overlapping DDM is tested for constant k = 1 on square domain 
[o.i] x [0,1]. First, we fixed the number of subdomains, and increase 
both the problem size and the wave number, to see the effect of in¬ 
creasing frequency to the preconditioner. Note that the number of 
PML layer points n pm i increases as problem size, so that on each sub- 
domain the ratio of the PML area to the total area is fixed. 8x8 
subdomains in x and y direction is used, and on each subdomain the 
local problem is factorized and solved with direct solver. 

The result is shown in table [l] The time cost for factorizing and 
solving local problem on the subdomain increases as the local problem 
size grows. The number of equivalent sweeping is around 5 for different 
frequency, thus the preconditioner is not affected by the frequency of 
the problem in the constant medium case. Actually, the number of 




Size 

n P mi 

Freq 

u/2n 

No. 

Iter 

No. equ 
sweep 

Time 

Fact 

Time 

Iter 

800 2 

30 

71.6 

53 

7 

0.41 

2.53 

1,600 2 

60 

143 

42 

5 

1.39 

7.55 

3,200 2 

120 

287 

37 

5 

6.42 

32.1 

6,400 2 

240 

573 

34 

4 

33.7 

111 

12,800 2 

480 

1147 

31 

4 

192 

440 


Table 1: The performance of the preconditioner for const medium problem, 
with number of subdomains fixed. 


equivalent sweeping decrease a little bit as the problem size grows, 
which is expected since the increasing number of points in PML layer 
leads to better absorption at the subdomain boundary. 

Second, the weak scalability test is preformed. Both the problem 
size and the number of subdomains increase, while the subdomain 
problem size is fixed. Such test exams whether a method is suitable 
for large scale parallel computing. The result is shown in Table [2j 
Since the problem size of each subdomain is fixed, a fixed setup time 
around 2.2s to factorize the local problem is required. The number 
of equivalent sweeping is kept almost unchanged as the total problem 
size increases, however, we found that the number of GMRES restart 
need to grow to maintain such iteration numbers. The number of 
iteration doubles as the problem size in one direction doubles, cause 
the total solving time doubles, and the iteration time is far larger than 
the fixed setup time as the problem grows large. 


Size 

Nb x x Nb y 

Freq 

u}/2n 

GMRES 

Restart 

No. 

Iter 

No. equ 
sweep 

Time 

solve(s) 

600 2 

2 x2 

55 

30 

9 

5 

3.5 

1 ,200 2 

4x4 

105 

30 

23 

6 

7.3 

2,400 2 

8 x8 

205 

30 

52 

7 

16 

4,800 2 

16 x 16 

405 

60 

106 

7 

33 

9,600 2 

32 x 32 

805 

120 

213 

7 

79 

19,200 2 

64 x 64 

1605 

240 

425 

7 

324 


Table 2: The performance of the preconditioner for const medium problem, 
with subdomain problem size fixed. 
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3.2 Layered medium 

The overlapping DDM is tested for simple layered medium problem on 
square domain [0,1] x [-1, 0], where five layers of medium lie nearly 
horizontally, as shown in Fig [3] left. The numerical solution of the 
problem with problem size 1200 2 is shown in Fig [ 3 ] right. 



Figure 3: Left: layered medium velocity profile. Right: numerical solution 
of problem size 1200 2 . 

Again, we fixed the number of subdomains, and increase both the 
problem size and the wave number, to see the effect of increasing 
frequency to the preconditioner. 8x8 subdomains in x and y direction 
is used, and the result is shown in table [3] The number of equivalent 
sweeping is around 10 for different frequency, a bit larger than that of 
constant medium problem in similar test, thus the preconditioner is 
not affected by the frequency of the problem in this case. Similar to the 
result of const medium problem, the number of equivalent sweeping 
decrease a little bit as the problem size grows. 

The weak scalability test is also preformed, and the result is shown 
in Table [4] The number of extra overlapping points Nol = 0,50 is 
tested to evaluate the effectiveness of enlarging overlapping region. 

The fixed setup time to factorize the local problem is around 2.2s 
for Nol = 0 and 3.1s for Nol = 50, respectfully. The number of 
equivalent sweeping is kept almost unchanged as the total problem 
size increase. Enlarging the overlapping region results smaller number 
of iteration, however since the local problem size increases and it take 
longer time to solve the local problem, the total time cost is bigger. 

So in this case, it’s better without extra overlapping region. 
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Size 

n P mi 

Freq 

u/2n 

No. 

Iter 

No. equ 
sweep 

Time 

Fact 

Time 

Iter 

800 2 

30 

35.8 

88 

11 

0.49 

4.70 

1,600 2 

60 

71.6 

83 

10 

1.79 

18.3 

3,200 2 

120 

143 

78 

10 

8.18 

79.4 

6,400 2 

240 

287 

74 

9 

34.0 

231 

12,800 2 

480 

573 

70 

9 

192 

935 


Table 3: The performance of the preconditioner for simple layered model, 
with number of subdomains fixed. 


Size 

Nb x x Nb y 

Freq 

Nol 

GMRES 

No. 

No. equ 

Time 



£ j/2n 


Restart 

Iter 

sweep 

solve(s) 

600 2 

2 x2 

27.5 

0 

30 

19 

10 

4.86 

50 

30 

16 

8 

8.28 

1 ,200 2 

4x4 

52.5 

0 

30 

46 

12 

12.4 

50 

30 

41 

10 

22.8 

2,400 2 

8 x8 

102.5 

0 

30 

88 

77 

11 

24.2 

50 

30 

10 

41.2 

4,800 2 

16 x 16 

202.5 

0 

60 

173 

11 

53.3 

50 

60 

153 

10 

91.9 

9,600 2 

32 x 32 

402.5 

0 

120 

338 

11 

127 

50 

120 

294 

9 

198 

19,200 2 

64 x 64 

802.5 

0 

240 

680 

11 

560 

50 

240 

615 

10 

855 


Table 4: The performance of the preconditioner for simple layered model, 
with subdomain problem size fixed. 

3.3 Marmousi model 

At last, the preconditioner is tested on the 2D Marmousi model in 
seismology, which is 3,000 m deep and 9, 200 m wide. Only P-wave is 
considered, thus elastic wave equation becomes an acoustic equation. 

The velocity profile is shown in Fig[4j the maximum velocity is 5500 
km/s and the minmum velocity is 1500 krn/s. The numerical solution 
of the problem with problem size 4, 275 x 1,425 is shown in Fig [5] right. 

The weak scalability test is also preformed, and the result is shown 
in Table [5] The number of equivalent sweeping is kept almost un- 
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Figure 4: Velocity profile of Marmousi model. 

Re(u) 




Figure 5: Solution of problem size 4,275 x 1,425. 


changed as the total problem size increases. The largest size problem 
has 977,407,500 unknowns, and 4,332 processors is used to solve it. 


Size 

Nb x x Nb y 

Freq 

(jj/2-k 

Restart 

No. 

Iter 

No. equ 
sweep 

Time 

solve 

4,275 x 1,425 

9x3 

63.5 

30 

86 

10 

56.6 

8,550 x 2,850 

18 x 6 

123 

30 

151 

8 

111 

17,100 x 5,700 

36 x 12 

242 

30 

290 

8 

230 

34,200 x 11,400 

72 x 24 

479 

60 

697 

10 

605 

54,150 x 18,050 

114 x 38 

756 

120 

1,123 

10 

1177 


Table 5: The performance of the preconditioner for Marmousi model, with 
subdomain problem size fixed. 
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